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Nested-error regression models are widely used for analyzing clus- 
tered data. For example, they are often applied to two-stage sample 
surveys, and in biology and econometrics. Prediction is usually the 
main goal of such analyses, and mean-squared prediction error is the 
main way in which prediction performance is measured. In this paper 
we suggest a new approach to estimating mean-squared prediction er- 
ror. We introduce a matched-moment, double-bootstrap algorithm, 
enabling the notorious underestimation of the naive mean-squared 
error estimator to be substantially reduced. Our approach does not 
require specific assumptions about the distributions of errors. Ad- 
ditionally, it is simple and easy to apply. This is achieved through 
using Monte Carlo simulation to implicitly develop formulae which, 
in a more conventional approach, would be derived laboriously by 
mathematical arguments. 

1. Introduction. Unbalanced nested-error regression models often arise 
in two-stage sample surveys, multilevel modeling, biological experiments and 
econometric analysis. Beside the noise, a source of variation is added to 
explain the correlation among observations within clusters, or subjects, and 
to allow the analysis to borrow strength from other clusters. Such nested- 
error regression models are particular cases of general linear mixed models, 
which often form the basis for inference about small-area means or subject- 
specific values. 

In this article we propose a new, nonparametric bootstrap technique for 
estimating the mean-squared error of predictors of mixed effects. The new 
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method has several attractive properties. First, it does not require specific 
distributional assumptions about error distributions. Second, it produces 
positive, bias-corrected estimators of mean-squared prediction errors. (See 
[2] and [5] for discussion of possible negativity.) Third, it is easy to apply. Al- 
though our emphasis is on small-area prediction, our methodology is equally 
useful for other applications, such as estimating subject- or cluster-specific 
random effects. 

Standard mixed-effects prediction involves two steps. First, a best linear 
unbiased predictor, or BLUP, is derived under the assumption that model 
parameters are known. Then, the model parameters are replaced by esti- 
mators, producing an empirical version of BLUP. This approach is popular 
because it is straightforward and, at this level, does not require distributional 
assumptions. 

However, estimation of mean-squared prediction error is significantly more 
challenging. The variability of parameter estimators can substantially in- 
fluence mean-squared error, to a much greater extent than a conventional 
asymptotic analysis suggests. Moreover, the nature and extent of this in- 
fluence is intimately connected to the values of the design variables and to 
properties of the two error distributions. 

In this paper we point out that, in terms of the biases of estimators of 
mean-squared prediction error, the two error distributions influence results 
predominantly through their second and fourth moments. This observation 
leads to a surprisingly simple, moment-matching, double-bootstrap algo- 
rithm for estimating, and correcting for, bias. We show that this approach 
substantially reduces the large degree of underestimation by the naive ap- 
proach. 

Kackar and Harville [20] and Harville and Jeske [18] studied various ap- 
proximations to the mean-squared prediction error of the empirical BLUP, 
assuming normality in both stages. Prasad and Rao [27] pointed out that 
if unknown model parameters are replaced by their estimators, then signifi- 
cant underestimation of true mean-squared prediction error can still result. 
This difficulty can have significant impact on policy making. To alleviate it, 
Prasad and Rao [27] constructed second-order correct mean-squared error 
estimators under normal models. Datta and Lahiri [8] extended the Prasad- 
Rao approach to cases where model parameters are estimated using max- 
imum likelihood, or restricted maximum likelihood, methods. Das, Jiang 
and Rao [6] gave rigorous proofs of these results under normality. Bootstrap 
methods in parametric settings have been suggested, for this problem, by 
Booth and Hobert [3] and Lahiri [22], for example. 

Jiang, Lahiri and Wan [19] proposed a jackknife-based bias correction 
of the mean-squared error estimator. Again, unlike the approach taken in 
the present paper, explicit parametric models are required. The problem 
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of mixed-effects prediction is in part one of deconvolution, and so conven- 
tional, nonparametric jackknife estimators of mean-squared error are not 
applicable; hence the need by Jiang, Lahiri and Wan [19] for parametric 
assumptions. For convenient implementation the methods proposed there 
also require a closed-form expression to be available for the leading term in 
an expansion of mean-squared prediction error, as a function of unknown 
parameters. The main advantage of our technique is that it does not require 
parametric assumptions about the distributions of the two sources of error 
in the model, or an analytical partition of those sources. 

On the other hand, the jackknife approach has advantages. Principal 
among these are the fact that it can be used beyond the linear setting 
treated in the present paper, for example in the case of generalized linear 
mixed models; and that, in a parametric context, related methods might 
potentially be employed to construct prediction intervals, rather than esti- 
mators of mean-squared prediction error. In the first of these settings, our 
method is confounded by nonlinear link functions. In the second, aspects of 
the error distributions, beyond just low-order moments, play necessary roles 
in constructing the prediction interval, and so again our moment-matching 
bootstrap approach is not suitable. Further discussion of jackknife methods 
in the small-area estimation problem is given by Lahiri [23]. 

An approach alternative to that given in this paper would be to estimate 
the two error distributions explicitly, and base a bootstrap algorithm on 
those estimators. However, since we wish to treat both error distributions 
nonparametrically, then the deconvolution problem would be quite nonstan- 
dard; the large literature on nonparametric deconvolution is devoted almost 
entirely to the case where one distribution is assumed known and the other 
is estimated. Early work in this area includes that of Carroll and Hall [4] 
and Fan [12, 13], and more recent contributions can be accessed through 
citations by, for example, Delaigle and Gijbels [9]. 

Identifiability of the full, doubly nonparametric deconvolution problem 
rests on the fact that some of the measurements are repeated. Methods for 
solution can be developed, for example, by starting from the approaches in- 
troduced by El-Amraoui and Goffinet [11] and Li and Vuong [26] in different 
contexts. However, in addition to the intrinsic difficulty, to a practitioner, of 
implementing a full deconvolution approach, such a technique would involve 
choosing smoothing parameters, which would have to be selected to optimize 
performance in a nonstandard problem where the target is bias reduction, 
not density estimation. By way of comparison, the bootstrap approach sug- 
gested in the present paper is simple and explicit. Only low-order moment 
estimators of the error distributions are required, and the estimators are 
directly defined as functions of the data. 
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2.1. Model. We observe data pairs (Xij,Yij) generated by the model 

(2.1) Yij=fi + X' ij /3 + Ui + s i jVij for 1 <i <n and 1 < j <n», 

where each m > 2, and /i are scalars, X y - is an r-vector, (5 is an r-vector 
of unknown parameters, the scalar Sjj is known (generally as a function of 
Xji, . . . ,Xi ni ), the UiS and Vy's are totally independent, the C/j's are identi- 
cally distributed, the V^-'s are identically distributed, E{Ui) = E(Vij) = for 
each i, j, E(Uf) = afj and .E(V^) = ciy. All inference will be conducted con- 
ditionally on X, which denotes the set of explanatory data X y - for 1 < i < n 
and 1 < j < rii . 

The model (2.1) is a generalization of the unbalanced nested-error regres- 
sion model [29, 31], and is commonly used to model two-level clustered data. 
For example, Battese, Harter and Fuller [1] and Datta and Ghosh [7] used 
this model, with = 1, for predicting the areas under corn and soybeans 
for 12 counties in North Central Iowa. Rao and Choudhry [30] studied the 
population of unincorporated tax filers from the province of Nova Scotia, 

1/2 

Canada using (2.1) with Sij = X-!\ 

Of course, (2.1) arises through noise, in terms of the Vy's, being added 
to an observation, 

e l = v + x! l (3 + u i , 

of the small-area modeling "parameter." Here 2L% = r \ l ^lj Xij- Our objec- 
tive is to make inference about estimators of the performance of predictors 
of the small-area mean 8j, or even just the random effect U% (in the case 
H = and f3 = 0). 

2.2. Formulae for predictors. Put Xi = a~ 1 J2j s ij 2 ^ij an d ^ = a 7 1 x 
J2j s ^ 2 Yij, where Oj = J2j s if- The best linear unbiased predictor of 0, is 

ef LUP = fi + xtf + Pl {Y - y. - x'p), 

where pi = crfj/(afj + a" 1 ay). Replacing \i and (5 by their weighted least- 
squares estimators, jl and (3 say, defined under the temporary assumption 
that Ojj and o~ v are known, we obtain an empirical version of Gf LUP , 

Gf LUP =fl + r i P + pi(Y i -fi- xfy . 

Here, 

/ n \ — 1 n 

\i=l / i=l 

{n \ ~ 1 n 

i=l J i=l 
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where 1, is the vector of l's of length m, Xj denotes the rxn; matrix 
with Xij as its jth column, Wj is the rij x rii matrix of which the (ji,j2)th 
component is o-f J + 5j 1 j 2 sj : j i a v , $hh ls * ne Kronecker delta, Yi is the re^-vector 
with jth component Yij, and 

/ n \ — n 

\i=l / i=l 

(n \ ~ 1 n 

i=l / i=l 

denote an r-vector and a scalar, respectively. 
A practical form of G>p LUP is 

iBLUP 



Of LUF = P + X! l (3 + p i {Y l -p- X'P), 



where ft, (3 and pi differ from ft, (3 and pi, respectively, in that afr and o~y are 



replaced by estimators, afj and a v say. We wish to construct a bias-corrected 
estimator of the mean-squared prediction error, 

(2.2) MSEj = E{(Qf LVP — @i) 2 | X}. 

2.3. Formulae for afj and a v . The estimators used here are borrowed 
from [31]. Put 

.,, t> _ Ejj^jjjj .. _ Ej s ij 2x ij _ Ej fj/jjj 

Ej s ij s u E? s ij 

Pij = s ij~i x ij ~ x i)> lij = s ij~i Y ij ~ Yi) and e ij = V%j ~ % In this notation, 

(2.4) Qij =PijP + eij, 1 < j <rii,l <i<n. 
Note too that E(eij) = and cov(eij 1 , eij 2 ) = tij 1 j 2 a v , where 

+ -a _ s jii + Si ^ 2 ~ 1 

l ihh ~ °hh -2 

Let Pj = (pn, . . . ,Pi >ni -i) be an r x (m - 1) matrix, and let P = (Pi, . . . ,P n ) 
be an r x (iV — n) matrix, where N = J2i n i- Let Qi = {in, ■ ■ ■ , %ni-x)' and 
e-i = {en,- ■ ■ , e^-i)' be (nj - l)-vectors, and let q = (q[, . . . , q' n )' and e = 
(e' 1; . . . , e' n )' be (A" — n)-vectors. Let Tj be the (n^ — 1) x (n^ — 1) matrix with 
tij 1 j 2 in position j'2), and let T be the (A'' — n) x (A 7 " — n) matrix with 
blocks Ti,...,T n down the main diagonal and blocks of zeros elsewhere. 
Bearing in mind linear relationships, the set of equations (2.4) is equivalent 
to 

(2.5) q = P'/3 + e, 
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where E(e) = and cov(e) = Ta v . 

We shall assume that N — n > r, and that the matrices T and PT -1 P' 
are both of full rank, N — n and r, respectively. Then, the sum of squares 
for error arising from the regression model (2.5) is SSEi = e'T _1 e, where 
e = q- P'/3 wls is the vector of residuals and /3 wls = (PT^P'^PT-^ is 
the weighted least-squares estimator of (5. It is shown in a longer version of 
this paper [17] that, under the full-rank conditions, 

(2.6) E(SSE 1 ) = (N -n-r)a v . 

This property motivates the estimator 

(o 7\ ^2 SSE i 

(2 - ?) av= N-n-r- 

In (2.6) and below we interpret expected value to be taken conditional on the 
set X of design variables. That is, we drop the notation a \X" used at (2.2). 

Write pij = s^Xij, q~ij = s^-Yij and = s" 1 ^ + Vij for the uncentred 
versions of pij, and e^, and put iij 1 j 2 = (s^Si^) <jy + Sj 1 j 2 ay. Let 
Pi = (pn, . . . ,Pi, ni ) be an r x m matrix, let P = (Pi, . . . , P n ) be an r x N 
matrix, let qi = (q~n, . . . , qi ni )' and e»j = {en, . . . , e^nj' be ni -vectors, and let 
q = (q[, . . . , q' n )' and e = (e[, . . . , e' n )' be ^-vectors. Let Tj be the x 
matrix with Uj 1 j 2 in position (jijjz), and let T be the N x N block-diagonal 
matrix with blocks Ti, . . . , T n down the main diagonal. In this notation, the 
model at (2.1) is equivalent to 

(2.8) q = P'P + e, 

where E{e) = and cov(e) = T. 

Assuming P is of full rank, r, the sum of squares for error arising from (2.8) 
is SSE2 = e'e, where e = q — P'/3 ols is a new vector of residuals, and (3 ols = 
(pp'^~ 1 pq i s the ordinary least-squares estimator. In a longer version of 
this paper [17] it is proved that, analogously to (2.6), 

(2.9) E(SSE 2 ) = Kafj + (N -r)a v , 
where K = Ki - K 2 , K± = J2i J2i s if an d 




E*«*« EE««^n) E^- 
i=i \j=i / \i=i j 

Property (2.9) suggests the estimator 

(2.10) <^) = max[ J &:- 1 {SSE2 - {N-r)a v },0}. 

Recall that the estimators afj and ay are substituted for afj and a v , 
respectively, in formulae for fi and (3, to obtain the estimators ft, and 0, 
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respectively. In particular, they are substituted for afj and a v in the formula 
Wj = crf/Im + °v diag(s? 1 , . . . , s 2 n .), to obtain Wj say, and we need to invert 

Wj when computing ft and /3. In some problems, a realistic discrete model 
for U and V can involve both taking the value zero with nonzero probability, 
and in this case there is a nonzero probability that W" 1 is not well defined. 
More generally, there might be concern about cases where the determinant 
of Wj is positive but close to zero. To remove these theoretical pathologies 
it is sufficient to replace SSEi by max(SSEi, S n ), where 5 n > denotes a 
small ridge parameter. See Section 4 for further discussion. 

2.4. Expansions, and analytical estimators, of MSEj . Recall the defini- 
tion of MSEj at (2.2). It can be proved that §f LUP - 6j = Aj + O p (n" 1 /2) ) 
where A.; = piVi — (1 — Pi)U{ and Vi is as at (2.3). This property suggests 
that MSEj = E(A 2 ) + 0{n~ l ). Indeed, 

(2.11) MSEj = E(A 2 ) + n~Vi(6) + 0{n~ 2 ), 

where, here and below, ipj denotes a smooth function depending only on the 
known design variables Xij and standard deviations Sij, and £i = (cfj, Oy,EXJ^, 
EV 4 ) is the vector consisting of second and fourth moments of U and V. 
See (4.2) in Theorem 1 in Section 4.1 for a rigorous formulation of (2.11). 
It is readily seen that 

(2.12) E(Aj) = ff_f 2 , 

where a, = J2j s ij 2 ^ s as m Section 2.2. Therefore, (2.11) can be written as 

(2.13) MSEj = MZo) + n~Vi(&) + 0(n" 2 ), 

where = (°~u>°~v) ana - ^0 ^ s an °tb er known, smooth function. 

From (2.13) it can be appreciated that, in order to estimate MSEj, we need 
only compute estimators of the second and fourth moments of U and V, and 
substitute them into the approximate formula, MSEj ~^/>o(£o) + n_1 V'i(£i)- 
However, this will introduce a bias of size n _1 , since if £0 = (dfj^y), then 

(2.14) E{Mio)} = MZo) + n-'MZi) + 0(n~ 2 ), 

where ip2 is a further known, smooth function. A rigorous formulation of 

(2.14) is given in (4.3) in Theorem 1. 

In fact, if we take £1 to be a vector of root-n consistent estimators of the 
respective components of £1, then, since 

(2.15) E{ij 1 (i l )} = MZ 1 ) + 0(n- 1 ), 
the estimator 

(2.16) MSEj = 0o (lo) + n"Vl(li) 
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will satisfy 

(2.17) E(M$Ei) = MSEj + n^MCi) + 0{n~ 2 ). 

We can correct for the term n _1 ^2(£i) on the right-hand side of (2.17) by 
moving it to the left, and replacing £1 by its estimator, 

(2.18) E{MSEi - n _1 ^2(|i)} = MSEj + 0(n" 2 ). 
Here we have used the fact that 

(2.19) E{MZi)} = MZi) + 0(n- 1 ). 

[Result (4.4) in Theorem 1 gives (2.15) and (2.19) under explicit regularity 
conditions.] Property (2.18) suggests a bias-corrected estimator, 

(2.20) MSE] 30 = MSEi - n -1 ^(|i), 
of MSEj, and argues that it has bias of order n~ 2 : 

(2.21) £(MSE, bc ) = MSEj + 0(n" 2 ). 

See Section 4.1 for discussion. Of course, (2.20) is motivated by the fact that 
the quantity 

(2.22) bias i = Ti- 1 ^2(|i) 

is an estimator of the bias of MSEj. 

While the estimators at (2.16) and (2.20) might be satisfactory from a the- 
oretical viewpoint, they are impractical or unattractive on several grounds. 
First, although the functions ifj± and ip2 are in principle known, they are very 
complicated functions of the Xij's and Sj^'s, and so implementing the estima- 
tors is not attractive to a practitioner. Second, the additive and subtractive 
nature of the corrections implicit in the procedures carries a risk that, in 
small to moderate samples, the estimators of MSEj will be negative. Third, 
the complexity of the functions ipi and ip2 would lead one to suspect that the 
procedures will be highly asymptotic in character. In particular, n will have 
to be quite large before reasonably unbiased estimators will be obtained. 
Taken together, these difficulties motivate development of an alternative, 
bootstrap approach, which is likely to be more attractive. The bootstrap 
algorithm suggested below uses Monte Carlo simulation to approximate the 
functions ipx and ip^, avoiding the need for explicit calculation. 

2.5. Bootstrap estimators o/MSEj. Results (2.13) and (2.14) imply that, 
in a bootstrap approach to this problem, it is sufficient from some viewpoints 
to resample from empirical "approximations" to the distributions of U and 
V that have first, second and fourth moments which are root-n consistent for 
the corresponding moments of U and V. In particular, we do not need the 
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distributions from which we resample to actually be consistent for the dis- 
tributions of U and V. This is a variant of the moment-matching, or "wild," 
bootstrap method, which almost invariably addresses first, second and third, 
rather than first, second and fourth, moments. For recent applications of the 
moment-matching bootstrap, see [10, 14, 15, 16, 21, 25, 28]. 

With this motivation, we consider the following bootstrap algorithm. 
Given Z2, z\ > with z\ < 24, let D(z2, 24) denote the distribution of a ran- 
dom variable Z, say, for which E(Z) = and E(Z 3 ) = Zj for j = 2,4. Let 
T> denote a class of such distributions, with exactly one member D(z2,Zi) 
for each pair (2:2,24). Given the estimators afj and a v at (2.10) and (2.7), 
as well as estimators ju and of ju = E(U 4 ) and — E(V 4 ), satisfy- 
ing the standard moment conditions afj < and a v < 7y , draw resamples 
U* = {U*, U*} and V* = {V*j : 1 < i < n, 1 < j < n,} by sampling inde- 
pendently from the distributions D(afj,^u) and D{dy,yv), respectively, 
the distributions being the uniquely determined members of T>. Mimicking 
the model (2.1), define 

Yi* = fit + X'^P + U* + SyV^ for 1 < % < n and 1 < j < m. 

Let Z and Z* denote the set of all pairs (Xij,Yij), and the set of all pairs 
(Xij,Y*j), respectively. Using the data in Z* , compute the bootstrap versions 

fi*, (3*, a* v , a v , %, % and §* BLUP of /}, P, &v, °V, lu, IV and §f LUP , 
respectively, and put 

(2.23) MSEj = E{(9f LVP - Q*) 2 \ Z}; 

compare (2.2). In (2.23), 6* = fi+XjP + U?. The quantity MSEj is our basic 
estimator of MSEj. We shall prove in Section 4 that it has bias of order n _1 . 

To bias-correct MSEj we use the double bootstrap, as follows. Conditional 
on W and V*, draw resamples {Uf*, . . . , U**} and {V** : 1 < i < n, 1 < j < 
rii] by sampling independently from the distributions Z){(<7[/) 2 , 7^} and 
D{(a v ) 2 ,j v }, respectively. Let 

Y** = ft* + Xljp + U** + SijV** for 1 < i < n and 1 < j < rii, 
and from the data pairs (Xy, YS*), compute the double-bootstrap version 

@**BLUP Qf gBLUP Define 

MSE* = i?{(6** BLUP - 0**) 2 \X,Z*}, 

where 6** = ft* + 2^P* + U** . Then MSE* is the direct bootstrap analogue 
of MSEj. The bias of MSEj is estimated by 



(2.24) 



biasi = E(MSE* I Z) - MSE,/, 
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and a simple bias-corrected estimator is 

(2.25) MSE^ = MSEj - bia^ = 2MSE; - E(MSE* \ Z). 

See Section 3 for discussion of other approaches. 

The bootstrap estimators biasj and MSE] 30 are analogues of the analytical 
estimators biasj and MSE] 30 , respectively, introduced in Section 2.4. We shall 
show in Section 4.2 that the bootstrap estimators have the same orders of 
accuracy as their analytical counterparts, in that 

(2.26) b!asj = biasi + O p (n~ 2 ), E(\Aasi) = £(biasj) + 0(n~ 2 ), 

(2.27) MSE* 50 = MSE> + O p {n~ 2 ), E(MSE}> C ) = E(MSE}' C ) + 0{n~ 2 ). 

2.6. Distributions DfayZ^). The simplest example of a distribution 
-D(l,p _1 ) of a random variable Z is perhaps the three-point distribution, 

(2.28) P(Z = 0) = l-p, P(Z = ±p~ 1 / 2 ) = y, 

where <p< 1. Here, E(Z) = 0, E(Z 2 ) = 1 and E(Z 4 ) = p~ l . Therefore we 

may take D(z2, z^) to be the distribution of z^ 2 Z when p = z\lz^. 

The Pearson family of distributions has the potential for fitting the first 
four moments. If (a) the first and third moments are zero, (b) the second 
is Z2 = 1 and (c) the fourth is 24 > 3, implying that tails are heavier than 
those of the normal distribution, then the Pearson family distribution is 
rescaled Student's t. The number of degrees of freedom, r, is not necessarily 
an integer, and is given by Z4 = 3(r — 2)/(r — 4). 

Section 3 reports results of a simulation study where both the three-point 
and Student's t distributions are used. While Student's t can be employed 
only when kurtosis is positive, this is the case in many practical situations. 

2.7. Estimating fourth moments of U and V . A variety of methods can 
be used; the one suggested here is based on estimating moments of residuals. 
Define 

Wi jlj2 (s,t) = s{Ui + Si^Vi^) +t(Ui + s ij2 Vij 2 ), 
to which an empirical approximation is 

W inj2 (s, t) = s(Y in - A - X'yJ) + t(Y ij2 - A - X' ij2 P). 

The average value, Wk(s,t), of W{j 1 j 2 (s,t) k , over pairs (ji,j2) of distinct 
integers 1 < ji, j'2 < TH and over 1 < i < n, is a root-n consistent estimator 
of the analogous average value, Wk{s,t) say, of E{Wij 1 j 2 (s,t) k }. Now, 

w A (l, -1) = 2a A E(V 4 ) + e 2 ^ 2 ^ ^ ^ (EV 2 ) 2 , 

}2i Tii{ni - 1) 
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where 04 = N 12 j s tj ■ This suggests the estimator 



7y = max 



(2a 4 )- 1 \ W4(l, -1)- 6 P*^i4) Z - 



Ei«i(nj - 1) 

of 7y = ^(y 4 ), which leads in turn to an estimator of ju = E(U A 



7(7 = max 



U=ij=i 

- S^oy H S 4 ~ TV US 4 f ' ^ 

i=lj'=l i=lj=l J 



3. Numerical properties. Recall, from (2.24), that the bias of u = MSE/ 

is estimated by biasj = v — u, where v = -E(MSEj | Z). The bias of u can 
be corrected in a broad variety of ways. Perhaps the simplest is to take 
u — biasj = 2u — v as our estimator of MSEj . To avoid difficulties with the sign 
of this quantity we might instead take as our bias correction u + n~ 1 g{n(u — 
£>)}, where g(t) is a smooth, symmetric, bounded function which equals t, or 
approximately t, when t is not very far 0. One approach which incorporates 
this idea is to use 

u + n~ 1 g{n(u — v)}, ifu>v, 
u 2 /[u + n~ 1 <7{n({) — u)}], if u < v. 



(3.1) 



MSE,- 



The two-case definition of MSEj ensures that this estimator is positive. The 
fact that the high-order correction, u — v, is captured inside the bounded 
function g limits the detrimental effects that stochastic variation of the cor- 
rection can have on overall variability, so removing the first drawback. 

An elementary choice, which gives very good results in practice, is g(t) = 
sgn(t) min(|t|, nc), where c is a positive constant. Perhaps surprisingly, g(t) = 
arctant also performs well. For the sake of brevity we shall report results 
only for the latter estimator, although similar performance is obtained using 
other approaches. 

In the remainder of this section we report results of a simulation study 
under the regression model (2.1). We took r = 1, fj, = 0, (3 = 1, each m = 3, 
Sij = 1 for all i and j, and n = 60 or 100; and we generated the Xjj-'s from 
the Uniform distribution on [^,1]. The objective was to estimate the mixed 
effects /U + Xj0 + U{. In problems of small-area estimation, this quantity can 
be treated as the small-area mean. 

Eight different models for the distributions of U and V were considered, in 
each case centered so that both distributions had zero mean. Variances were 
standardized so that the ratio afj/ay equaled i, 1 or 2, max(cr^,0y) = 1, 



and mm(afj,ay j 



1 



or 1. The models were Mi: U and V are both normal; 
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M 2 : U and V are both y^f; M 3 : U and V are both xh M 4 : U and V 
are both xlo'> M5: £/ and V are both exponential; Mg: U is xi and V is 
— x\\ M7: [/ and V are both Student's t^; M§: C7 and V both have logistic 
distributions. 

We used empirical measures of relative bias and coefficient of variation to 
quantify the performances of our methods for different distributions. Rela- 
tive bias of the mean-squared error estimator was defined to be the average, 
over i, of 

(3.2) RBi = £(MSE,)-SM SEi 



SMSEj 

% = 1, ...,n, where -E(MSEj) was estimated empirically as the average of 
values of MSEj over replicates. (We shall also report the average of the 
absolute values of the RBj's.) Likewise, SMSEj was defined as the average 
value of (0j — 0j) 2 over replicates. The coefficient of variation of the MSE 
estimator was taken to be the average, over i, of 

c _ {EjMSEj - SMSEj) 2 } 1 / 2 
i SMSEj 

i = 1, . . . , n, where £?(MSEj — SMSEj) 2 was computed by averaging £'(MSEj — 
SMSEj) 2 over replicates. 

Table 1 reports results in the case afj/ay = 1, and Table 2 gives results 
for crfj/cTy = 2 an d 2. For comparison, results for the "naive" mean-squared 
prediction error estimator, without any bias correction, are reported in the 
column headed RBN. The naive estimator is obtained by replacing afj and 
<jy , in the formula at (2.12), by afj and a v , respectively. 

In the problem of estimating predictive mean-squared error, the naive es- 
timator is notoriously optimistic; it is significantly negatively biased. Erring 
by giving a falsely positive impression of reliability can significantly affect the 
level of debate about policy decisions based on predictions. Ideally, bias cor- 
rection should remove much of this effect, producing estimators that tend to 
err on the side of overestimation of variance, and, against that background, 
to reduce the overall magnitude of bias. The results in Tables 1 and 2 show 
that, to a substantial extent, our bias-corrected estimator achieves this goal. 

In Table 1, the average relative bias, across all models, is small, less than 
10% and, in some individual cases, less than 5%. The three-point distribution 
tends to give lower relative bias than Student's t in the case of skewed 
distributions, although it has slightly higher coefficient of variation. For 
our method, both the relative bias and the coefficient of variation tend to 
decrease as n increases. 
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In marked contrast, the naive estimator of mean-squared error suffers 
from substantial underestimation, in the range 8%-20%. Indeed, the per- 
centages of cases where underestimation occurred, for models Mi to Ms, re- 
spectively, and for the pair (bias-corrected estimator, naive estimator), are 
(3,26), (5,56), (26,67), (20,63), (13,59), (9,29), (3,52) and (0,73), respec- 
tively. The average percentages of absolute bias, measured in terms of (me- 
dian, mean), are (12.6, 15.9) for our bias-corrected estimator, and (18.8, 24.4) 
for the naive method. All these results are for the case of moment-matching 
using the three-point distribution. 

In Table 2, to save space we give results only for models M3 and M7. It 
can be seen from those results that, for unequal variance components, both 
the relative bias and the coefficient of variation tend to take higher values, 
compared to the equal-variance case. However, the difference is not large. 
The three-point distribution, used to match moments, tends to give slightly 
better results here than the Student's t approach. 

The performance of normal-theory bias corrections applied to nonnormal 
data is well documented. For example, in the case of exponential data, use 
of normal theory can result in relative bias of 19% [27]. The extent of overes- 
timation evidenced in Tables 1 and 2 is common in work on bias-correction 
in related problems; see the simulation results of [24, 27, 32]. 

Table 1 

Relative bias and empirical coefficient of variation under different models 



n = 60 n = 100 



Model 


3pt 




t 




3pt 




t 






RB 


cv 


RB 


CV 


RBN 


RB 


CV 


RB 


CV 


RBN 


Mi 


0.088 


0.250 


0.084 


0.244 


-0.147 


0.082 


0.238 


0.078 


0.142 


-0. 


150 




0.091 


0.290 


0.100 


0.286 


-0.131 


0.098 


0.280 


0.080 


0.162 


-0. 


142 


M 2 


0.062 


0.262 


0.099 


0.253 


-0.185 


0.058 


0.247 


0.081 


0.248 


-0. 


181 




0.089 


0.289 


0.103 


0.291 


-0.187 


0.092 


0.286 


0.088 


0.274 


-0. 


142 


M 3 


0.066 


0.292 


0.097 


0.271 


-0.200 


0.040 


0.262 


0.053 


0.264 


-0. 


191 




0.095 


0.331 


0.101 


0.323 


-0.200 


0.067 


0.298 


0.048 


0.301 


-0. 


191 


M 4 


0.064 


0.272 


0.062 


0.258 


-0.125 


0.039 


0.254 


0.064 


0.258 


-0. 


103 




0.076 


0.312 


0.099 


0.305 


-0.121 


0.051 


0.279 


0.076 


0.289 


-0. 


117 


M 5 


0.088 


0.360 


0.103 


0.331 


-0.141 


0.070 


0.295 


0.090 


0.278 


-0. 


142 




0.108 


0.375 


0.111 


0.375 


-0.163 


0.079 


0.327 


0.100 


0.315 


-0. 


158 


M e 


0.006 


0.283 


0.109 


0.282 


-0.125 


0.044 


0.276 


0.080 


0.281 


-0. 


112 




0.075 


0.317 


0.121 


0.316 


-0.125 


0.064 


0.312 


0.088 


0.313 


-0. 


112 


M 7 


0.100 


0.331 


0.099 


0.287 


-0.158 


0.028 


0.262 


0.015 


0.246 


-0. 


100 




0.106 


0.376 


0.099 


0.327 


-0.166 


0.036 


0.280 


0.036 


0.268 


-0. 


115 


M 8 


0.104 


0.299 


0.065 


0.260 


-0.112 


0.093 


0.281 


0.056 


0.244 


-0. 


080 




0.100 


0.326 


0.119 


0.318 


-0.140 


0.097 


0.288 


0.066 


0.277 


-0. 


114 



The first line in each row gives median values, and the second line, means. 
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Table 2 

Relative bias and empirical coefficient of variation for models M3 

and M7 







Model 




3pt 




t 




RB 


CV 


RB 


CV 


RBN 




1 

— 2 


M 3 


0.110 


0.099 


0.308 


0.309 


-0.181 








0.103 


0.081 


0.324 


0.346 


-0.190 






M 7 


0.124 


0.100 


0.289 


0.304 


-0.135 








0.109 


0.114 


0.338 


0.344 


-0.135 


„2 lJ2 


= 2 


M 3 


0.099 


0.104 


0.310 


0.307 


-0.118 








0.112 


0.111 


0.358 


0.358 


-0.140 






M 7 


0.105 


0.081 


0.326 


0.323 


-0.116 








0.111 


0.099 


0.379 


0.366 


-0.137 



The first line in each row gives median values, and the second line, 
means. 



Finally we report on a comparison of our method with the parametric 
jackknife approach suggested by Jiang, Lahiri and Wan [19]. The latter is 
awkward to implement unless there is a closed-form expression for the lead- 
ing term in an expansion of mean-squared prediction error, as a function of 
unknown parameters. Among the models Mi -Ms, a closed- form expression 
exists only for the first (i.e., normal-normal) model. Moreover, only in this 
case does the best predictor (the small-area estimator in the case of the 
jackknife) have a closed- form expression. 

In the normal-normal model, and when afj/ay = 1 and n = 60, the me- 
dian (mean) relative bias and the median (mean) coefficient of variation are 
0.035 (0.049) and 0.262 (0.298), respectively. When n = 100 the correspond- 
ing values are 0.034 (0.047) and 0.156 (0.182). For unequal variance ratios 
the relative biases are close to these values, while the coefficients of variation 
are higher. 

Comparing these results with those in Table 1, it can be seen that the 
jackknife method, which uses full knowledge of the error distributions, per- 
forms better in terms of relative bias but is inferior in terms of coefficient of 
variation, relative to the nonparametric bootstrap method. The impact of 
deviation from normality of error distributions has been reported by Prasad 
and Rao [27] and Wang and Fuller [32]. 

4. Theoretical properties. 

4.1. Rigorous formulations of (2.11), (2.14), (2.15), (2.19) and (2.21). 
We begin by stating, and discussing, regularity conditions. Of the n^'s, s^'s, 
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Xij's and distributions of U and V we ask that: 

(a) supj rii < oo and each rij > 2, (b) C\ < < C2 for constants 
< Ci < C2 < 00 and for all i and j, (c) the vectors X^ are 
conditioned-upon values of independent copies of the random 
(a i\ r-vector X, the distribution of which is continuous and satis- 



^ ' ' fies P(||X|| < C3) = 1 for some < C3 < 00, (d) all moments of 
U and V are finite, E{U) = E(V) = and a v > 0, and (e) the 
eigenvalues of the rtj x rtj matrix Tj are bounded away from zero, 
uniformly in 1 < i < ra < 00. 

The conditions on rtj in (a) do not require any constraints on the long- 
run frequencies of different values of the n^s. As a result, the functions ipj 
in the expansions in Section 2.4, which depend on n, may not converge as 
n — > 00. However, they will converge if we assume in addition that the pro- 
portion of values of i, 1 < i < n, for which rjj takes any given value between 
2 and sup^nj, converges as n — > 00. Nevertheless, without this condition the 
functions ipj are uniformly bounded. 

Condition (c), in (4.1) and on the X^'s, can be weakened, and in par- 
ticular it is not essential to assume that the distribution of each Xij is the 
same for all i and j. However, without that constraint, more complex as- 
sumptions have to be made in order to ensure that the distributions of the 
Xjj's do not become "asymptotically degenerate" as n — > 00. If this occurs, 
then it could adversely affect assumptions made in Section 2.3 about the 
rank of the matrices P and P; those assumptions automatically hold, with 
probability 1 with respect to the process generating the Xy's, under the 
present conditions. Concerning assumption (d), it is not essential to assume 
that ay > 0. Assumption (e) is a restriction on choice of the s^'s. 

As noted at the end of Section J2.4, in general it is necessary to introduce 
a ridge parameter to ensure that Wj is nonsingular. We do this by replacing 
SSEi by max(SSEi, Bin~ B ' 2 ), for some B\ > and B2 > 2, in the definition 
of SSEi. It will be assumed below that this has been done. Depending on the 
distributions of U and V, this can slightly alter the definitions of dy and afj. 
The ridge parameter is not necessary for Theorem 1 if the distribution of V 
is absolutely continuous. 

Recall from Section 2.4 that £0 = {°ui°v)i £1 = ( a u ' a v 1 7u , Jv) and 



MCo) = crla^al/ial + a^al), where lv = E{U A ) and lv = E(V 4 ). Here, 



and in (4.2)-(4.6) below, we suppress the dependence of the functions tj;o, 
ipi and "i/>2 on i, and expectations are interpreted as conditional on X . 




Theorem 1. If (4.1) holds, then, for a class of realizations of X that 
arises with probability 1, and for k = l,2, 
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(4.3) E{Mtu,*v)} = MZo) + n-V 2 (6) + 0(n~ 2 ), 

(4.4) E{M*u,Vvnv,lv)} = Mb) + O^" 1 ), 

uniformly in 1 <i <n, where the functions ip\ and ip2 are determined solely 
by the design variables Xij and weights Sij for 1 < i < j < n, depend on n, 
are bounded in a neighborhood o/£i, and are infinitely differentiable. 

A proof of (4.2) is given in the web version of this paper [17]. Derivations 
of (4.3) and (4.4) are similar but simpler. Together, (4.2)-(4.4) imply (2.21), 

which asserts that MSEj 30 , defined at (2.20), has bias equal to 0(n~ 2 ). 

4.2. Theory for the bootstrap. In Sections 2.4 and 2.5 we discussed an- 
alytical and bootstrap-based bias corrections, respectively. In particular, 
MSE;, at (2.16), was an analytical estimator of MSEjj the associated an- 
alytical bias estimator was biasj, at (2.22); and the bias-corrected estimator 
was MSEj 30 = MSEj - bias*, at (2.20). In the same vein, MSE i; at (2.23), 
was a bootstrap estimator of MSEj; the corresponding bootstrap estimator 
of bias was bias,, at (2.24); and the resulting bias-corrected estimator was 
MSE* 50 , at (2.25). 

The effectiveness of the bootstrap approach is reflected in the fact that it 
gives a degree of correction that is identical to that provided by the analytical 
method, up to terms of order n~ 2 , as the next result shows. For definiteness 
we assume there that the moment-matching bootstrap method is based on 
the three-point distribution at (2.28); the Student's t model does not permit 
correction for negative kurtosis. We suppose too that the ridge parameter 
defined two paragraphs above Theorem 1 is incorporated into the definition 
of Oy . (This turned out not to be necessary in our simulation study, even 
though the three-point distribution has positive mass at zero. That can be 
explained by noting that the probability of difficulty being caused by the 
positive mass at zero is exponentially small, as a function of sample size, 
whereas we used only polynomially many bootstrap simulations.) 

Theorem 2. // (4.1) holds, and if the distribution at (2.28) is used to 
implement the moment-matching bootstrap, then for a class of realizations 
of X that arises with probability 1, 

(4.5) MSEi-MSEi = O p {n- 2 ), E{MSEi - MSEj} = 0(n~ 2 ), 

(4.6) biasj — biasj = O p (n~ 2 ), E'jbiasj — biasj} = O p (n~ 2 ), 
uniformly in 1 <i <n. 

Results (4.2)-(4.4) established the efficacy of the analytical approach to 
bias correction. In combination with those properties, (4.5) and (4.6) do the 
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same for the bootstrap approach, by establishing (2.26) and (2.27). A proof 
of Theorem 2 is given by Hall and Maiti [17]. 
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